Balancing Exploration and Exploitation in Classical Planning
Authors
Abstract
Successful heuristic search planners for satisficing planning, such as FF or LAMA, are usually based on one or more best-first search techniques. Recent research has led to planners like Arvand, Roamer, or Probe, in which novel techniques such as Monte-Carlo Random Walks extend the traditional exploitation-focused best-first search with an exploration component. The UCT algorithm balances these conflicting incentives and has shown tremendous success in related areas of sequential decision making, but it has not yet been applied to classical planning. We address this gap by applying the Trial-based Heuristic Tree Search (THTS) framework to classical planning. We show how to model the best-first search techniques Weighted A* and Greedy Best-First Search with only three ingredients: action selection, initialization, and backup function. We then use THTS to derive four versions of the UCT algorithm that differ in their backup functions. The experimental evaluation shows that our main algorithm, GreedyUCT, outperforms all other algorithms presented in this paper in terms of both coverage and quality.
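To make the exploration/exploitation trade-off in the action selection ingredient concrete, the following is a minimal sketch of a UCB1-style selection rule adapted to cost minimization. It is not the paper's exact formula: the function name `ucb1_select`, the `(cost_estimate, visits)` child layout, and the default exploration constant are assumptions made for illustration, and cost estimates are left unnormalized for simplicity.

```python
import math


def ucb1_select(children, exploration=math.sqrt(2)):
    """Return the index of the child to expand next.

    children: list of (cost_estimate, visits) pairs (hypothetical layout).
    Lower cost is better, so the exploration bonus is subtracted.
    """
    total_visits = sum(visits for _, visits in children)
    best_index, best_score = None, math.inf
    for index, (cost_estimate, visits) in enumerate(children):
        if visits == 0:
            # Assumption: every child is tried once before scores are compared.
            return index
        # The bonus shrinks as a child is visited more often.
        bonus = exploration * math.sqrt(math.log(total_visits) / visits)
        score = cost_estimate - bonus
        if score < best_score:
            best_index, best_score = index, score
    return best_index
```

With `exploration=0` the rule reduces to purely greedy selection on the cost estimate; larger values favor rarely visited children. In practice the estimates would likely need to be scaled so that the bonus is comparable in magnitude.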
Similar resources
Exploration in relational domains for model-based reinforcement learning
A fundamental problem in reinforcement learning is balancing exploration and exploitation. We address this problem in the context of model-based reinforcement learning in large stochastic relational domains by developing relational extensions of the concepts of the E³ and R-MAX algorithms. Efficient exploration in exponentially large state spaces needs to exploit the generalization of the learne...
Model based Bayesian Exploration
Reinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information: the expected improvement in future decision quality arising from the information acquired by exploration. Estimating this quantity requires an a...
Best-First Width Search: Exploration and Exploitation in Classical Planning
It has been shown recently that the performance of greedy best-first search (GBFS) for computing plans that are not necessarily optimal can be improved by adding forms of exploration when reaching heuristic plateaus: from random walks to local GBFS searches. In this work, we address this problem but using structural exploration methods resulting from the ideas of width-based search. Width-based...
Balancing Exploration and Exploitation in Alliance Formation
Do firms balance exploration and exploitation in their alliance formation decisions and, if so, why and how? We argue that absorptive capacity and organizational inertia impose conflicting pressures for exploration and exploitation with respect to the value chain function of alliances, the attributes of partners, and partners’ network positions. Although path dependencies reinforce either explo...
Multiobjective Automatic Parameter Calibration of a Hydrological Model
This study proposes variable balancing approaches for the exploration (diversification) and exploitation (intensification) of the non-dominated sorting genetic algorithm-II (NSGA-II) with simulated binary crossover (SBX) and polynomial mutation (PM) in the multiobjective automatic parameter calibration of a lumped hydrological model, the HYMOD model. Two objectives—minimizing the percent bias a...